ReNoun: Fact Extraction for Nominal Attributes
Authors
Abstract
Search engines are increasingly relying on large knowledge bases of facts to provide direct answers to users’ queries. However, the construction of these knowledge bases is largely manual and does not scale to the long and heavy tail of facts. Open information extraction tries to address this challenge, but typically assumes that facts are expressed with verb phrases, and therefore has had difficulty extracting facts for noun-based relations. We describe ReNoun, an open information extraction system that complements previous efforts by focusing on nominal attributes and on the long tail. ReNoun’s approach is based on leveraging a large ontology of noun attributes mined from a text corpus and from user queries. ReNoun creates a seed set of training data by using specialized patterns and requiring that the facts mention an attribute in the ontology. ReNoun then generalizes from this seed set to produce a much larger set of extractions that are then scored. We describe experiments that show that we extract facts with high precision and for attributes that cannot be extracted with verb-based techniques.
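The seed step described in the abstract (specialized patterns plus the requirement that the attribute appear in the ontology) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the toy `ATTRIBUTE_ONTOLOGY`, the single pattern "the A of S is O", the function `extract_seed_facts`, and the example sentences are hypothetical and are not taken from ReNoun itself. The generalization and scoring steps are not shown.

```python
import re

# Hypothetical toy ontology of noun attributes. The abstract says the real
# ontology is mined from a text corpus and user queries; this set is made up.
ATTRIBUTE_ONTOLOGY = {"ceo", "capital", "legal affairs correspondent"}

# One illustrative seed pattern: "the <attribute> of <subject> is <object>".
# Subject and object must start with a capital letter; the object ends at
# punctuation or at the end of the string.
SEED_PATTERN = re.compile(
    r"\b[Tt]he (?P<attr>[A-Za-z ]+?) of (?P<subj>[A-Z][\w ]*?)"
    r" is (?P<obj>[A-Z][\w ]*?)(?=[.,;]|$)"
)


def extract_seed_facts(sentence):
    """Return (subject, attribute, object) triples matched by the pattern,
    keeping only those whose attribute is in the ontology."""
    facts = []
    for match in SEED_PATTERN.finditer(sentence):
        attribute = match.group("attr").strip().lower()
        # Ontology check: discard matches whose noun phrase is not a known attribute.
        if attribute in ATTRIBUTE_ONTOLOGY:
            facts.append(
                (match.group("subj").strip(), attribute, match.group("obj").strip())
            )
    return facts


if __name__ == "__main__":
    print(extract_seed_facts("The CEO of Google is Larry Page."))
    print(extract_seed_facts("The mood of Paris is Bright."))
```

In this sketch the ontology check is what keeps a high-recall surface pattern from producing arbitrary triples: the second sentence matches the pattern, but "mood" is not a listed attribute, so it is dropped.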
Related resources
Structural, Transitive and Latent Models for Biographic Fact Extraction
This paper presents six novel approaches to biographic fact extraction that model structural, transitive and latent properties of biographical data. The ensemble of these proposed models substantially outperforms standard pattern-based biographic fact extraction methods and performance is further improved by modeling inter-attribute correlations and distributions over functions of attributes, a...
Boolean Reasoning for Feature Extraction Problems
We recall several applications of Boolean reasoning for feature extraction and we propose an approach based on Boolean reasoning for new feature extraction from data tables with symbolic (nominal, qualitative) attributes. New features are of the form $a \in V$, where $V \subseteq V_a$ and $V_a$ is the set of values of attribute $a$. We emphasize that Boolean reasoning is also a good framework for complexity analysi...
Towards a General Technique for Transformation of Nominal Features into Numeric Features in Supervised Learning
Almost all machine learning problems require data preprocessing. This stage is especially important for problems where the datasets contain features of mixed types (i.e. nominal and numeric). A common practice in such cases is to transform each nominal feature into many dummy (i.e. binary) features. Also, many classification algorithms prefer numeric attributes over nominal a...
Ordinal Measurement for Decision Aid: a Conceptual Framework and Research Agenda
Decision-aiding situations increasingly involve ordinal or nominal information about the alternatives considered by a client or decision maker (hereafter, decision maker, DM). By ordinal or nominal information we mean that evaluations on attributes, descriptors, indexes, criteria etc. may be expressed on ordinal or n...
Anonymization of nominal data based on semantic marginality
Nominal attributes are very common in data sets about individuals, specifically medical data like patient healthcare records. Attributes of this type tend to be sensitive due to their personal nature. If public-use data sets need to be released, e.g. for clinical research purposes, data should be first anonymized. However, since most anonymization methods omit data semantics when dealing with n...